    Pure Exploration with Multiple Correct Answers

    We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the existing Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and whose asymptotic sample complexity matches the lower bound.
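
    To fix ideas, here is a minimal sketch of the single-answer Track-and-Stop recipe for best-arm identification with unit-variance Gaussian arms: track a plug-in estimate of the optimal allocation and stop on a generalized-likelihood-ratio test. The allocation oracle and the stopping threshold below are simplified, illustrative stand-ins, not the paper's exact quantities.

        import numpy as np

        def oracle_weights(mu):
            """Crude stand-in for the optimal proportions w*(mu): suboptimal
            arms get weight ~ 1/gap^2, and the best arm's weight follows the
            Gaussian stationarity condition w_best^2 = sum of other w_a^2."""
            best = int(np.argmax(mu))
            gaps = np.maximum(mu[best] - mu, 1e-3)
            w = 1.0 / gaps**2
            w[best] = 0.0
            w[best] = np.sqrt(np.sum(w**2))
            return w / w.sum()

        def track_and_stop(true_mu, delta=0.05, rng=np.random.default_rng(0)):
            K = len(true_mu)
            counts, sums = np.zeros(K), np.zeros(K)
            for a in range(K):  # forced initialization: pull each arm once
                sums[a] += rng.normal(true_mu[a]); counts[a] += 1
            t = K
            while True:
                mu_hat = sums / counts
                best = int(np.argmax(mu_hat))
                # smallest generalized likelihood ratio vs. any alternative
                glr = min((mu_hat[best] - mu_hat[a])**2
                          / (2 * (1 / counts[best] + 1 / counts[a]))
                          for a in range(K) if a != best)
                if glr > np.log((1 + np.log(t)) * K / delta):  # simplified threshold
                    return best, t
                if counts.min() < np.sqrt(t):       # forced exploration
                    a = int(np.argmin(counts))
                else:                               # D-tracking step
                    a = int(np.argmax(t * oracle_weights(mu_hat) - counts))
                sums[a] += rng.normal(true_mu[a]); counts[a] += 1; t += 1

        print(track_and_stop(np.array([0.5, 0.4, 0.2])))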

    Random permutation online isotonic regression

    We revisit isotonic regression on linear orders, the problem of fitting monotonic functions to best explain the data, in an online setting. It was previously shown that online isotonic regression is unlearnable in a fully adversarial model, which led to its study in the fixed design model. Here, we instead develop the more practical random permutation model. We show that the regret is bounded above by the excess leave-one-out loss, for which we develop efficient algorithms and matching lower bounds. We also analyze the class of simple and popular forward algorithms and recommend where to look for algorithms for online isotonic regression on partial orders.
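
    For orientation, here is a minimal sketch of the standard offline solver, the pool-adjacent-violators algorithm (PAVA), together with a forward-algorithm-style predictor that refits on the data seen so far. The forward_predict wrapper is an illustration of the general recipe, not the paper's exact estimator.

        def pava(y):
            """Isotonic (non-decreasing) least-squares fit of y via PAVA."""
            blocks = []  # each block: [fitted level, number of points pooled]
            for v in y:
                blocks.append([float(v), 1])
                # pool adjacent blocks while monotonicity is violated
                while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
                    (l1, n1), (l2, n2) = blocks[-2], blocks[-1]
                    blocks[-2:] = [[(n1 * l1 + n2 * l2) / (n1 + n2), n1 + n2]]
            return [level for level, n in blocks for _ in range(n)]

        def forward_predict(xs, ys, x_new):
            """Forward-algorithm flavour: refit on all past data, then predict
            x_new with the fitted value of its nearest past design point."""
            order = sorted(range(len(xs)), key=lambda i: xs[i])
            fit = pava([ys[i] for i in order])
            nearest = min(order, key=lambda i: abs(xs[i] - x_new))
            return fit[order.index(nearest)]

        print(pava([1, 3, 2, 4, 0]))  # -> [1.0, 2.25, 2.25, 2.25, 2.25]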

    Non-Asymptotic Pure Exploration by Solving Games

    Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes …
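
    The titular idea can be illustrated on Gaussian best-arm identification, where the optimal allocation solves a concave max-min problem; in the sketch below, an exponentiated-gradient learner plays allocations against a best-responding adversary that picks the hardest alternative arm. This is a toy rendition; the paper's online learners and guarantees are more refined.

        import numpy as np

        def solve_allocation_game(mu, iters=2000, lr=0.5):
            """Approximate argmax_w min_{a != *} [w_* w_a/(w_* + w_a)] * (mu_* - mu_a)^2/2."""
            K = len(mu); star = int(np.argmax(mu))
            d = (mu[star] - mu)**2 / 2          # per-arm "difficulty"
            w = np.full(K, 1.0 / K)
            w_avg = np.zeros(K)
            for _ in range(iters):
                # adversary best-responds with the hardest alternative arm
                alts = [a for a in range(K) if a != star]
                vals = [w[star] * w[a] / (w[star] + w[a]) * d[a] for a in alts]
                a = alts[int(np.argmin(vals))]
                # supergradient of the active term, then exponentiated-gradient step
                g = np.zeros(K)
                g[star] = d[a] * (w[a] / (w[star] + w[a]))**2
                g[a] = d[a] * (w[star] / (w[star] + w[a]))**2
                w = w * np.exp(lr * g)
                w /= w.sum()
                w_avg += w
            return w_avg / iters                # averaged iterates

        print(solve_allocation_game(np.array([1.0, 0.8, 0.5, 0.3])))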

    Lipschitz and comparator-norm adaptivity in online learning

    We study Online Convex Optimization in the unbounded setting where neither predictions nor gradients are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using O(d) time per round; the second adapts to their squared inner products (which measure variance only in the comparator direction) in O(d^3) time per round. We then generalize two prior reductions …
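
    For background, here is the classic one-dimensional coin-betting (Krichevsky-Trofimov) construction that the parameter-free literature builds on, assuming gradients bounded by 1. It is not the paper's algorithm, which additionally removes the known-Lipschitz assumption and handles hints.

        def kt_bettor(gradients, initial_wealth=1.0):
            """Coin-betting learner: gradients g_t in [-1, 1], loss g_t * x_t."""
            wealth, grad_sum, preds = initial_wealth, 0.0, []
            for t, g in enumerate(gradients, start=1):
                beta = grad_sum / t        # KT betting fraction
                x = beta * wealth          # prediction = bet on the coin
                preds.append(x)
                wealth -= g * x            # wealth update; stays positive
                grad_sum -= g              # running sum of negative gradients
            return preds

        # a persistent gradient direction makes the bets grow automatically,
        # without any tuned learning rate
        print(kt_bettor([-1.0] * 10))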

    Mixture martingales revisited with applications to sequential tests and confidence intervals

    This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and take into account multiple arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems. We establish asymptotic optimality of sequential tests generalising the track-and-stop method to problems beyond best arm identification. We further derive sharper stopping thresholds, where the number of arms is replaced by the newly introduced pure exploration problem rank. We construct tight confidence intervals for linear functions and minima/maxima of the vector of arm means.
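
    A minimal sketch of the mixture-martingale mechanism in its simplest instance: i.i.d. 1-sub-Gaussian observations, a N(0, rho^2) mixing prior over the tilt parameter (which gives Robbins' normal-mixture martingale in closed form), and Ville's inequality. The paper's inequalities are sharper and cover exponential families and multiple arms at once.

        import numpy as np

        def mixture_radius(t, delta=0.05, rho=1.0):
            """With prob >= 1 - delta, |sum_{s<=t} (X_s - mu)| <= radius for ALL t,
            from the closed-form normal mixture of exp(lam*S_t - t*lam^2/2)."""
            v = 1.0 + rho**2 * t
            return np.sqrt(2.0 * v / rho**2 * (np.log(1.0 / delta) + 0.5 * np.log(v)))

        # running, time-uniform confidence interval for the mean of N(mu, 1) samples
        rng = np.random.default_rng(0)
        mu, S = 0.3, 0.0
        for t in range(1, 10001):
            S += rng.normal(mu)
            if t in (10, 100, 1000, 10000):
                r = mixture_radius(t)
                print(f"t={t:5d}  mean in [{(S - r)/t:+.3f}, {(S + r)/t:+.3f}]")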

    Luckiness in multiscale online learning

    Algorithms for full-information online learning are classically tuned to minimize their worst-case regret. Modern algorithms additionally provide tighter guarantees outside the adversarial regime, most notably in the form of constant pseudoregret bounds under statistical margin assumptions. We investigate the multiscale extension of the problem, where the loss ranges of the experts are vastly different. Here, the regret with respect to each expert needs to scale with its range, instead of the maximum overall range. We develop new multiscale algorithms, tuning schemes and analysis techniques to show that worst-case robustness and adaptation to easy data can be combined at a negligible cost. We further develop an extension with optimism and apply it to solve multiscale two-player zero-sum games. We demonstrate experimentally the superior performance of our scale-adaptive algorithm and discuss the subtle relationship of our results to Freund’s 2016 open problem.
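
    As an illustration of the multiscale setting, here is a sketch of exponential weights with one learning rate per expert and eta-weighted play probabilities, a common template in the multiscale literature. The paper's algorithm, tuning schemes, and luckiness guarantees are substantially more refined than this heuristic.

        import numpy as np

        def multiscale_hedge(loss_fn, sigmas, T):
            """loss_fn: t -> array of expert losses with |loss_i| <= sigmas[i]."""
            sigmas = np.asarray(sigmas, dtype=float)
            N = len(sigmas)
            eta = np.sqrt(np.log(N) / T) / sigmas    # one learning rate per scale
            log_w = np.zeros(N)
            total = 0.0
            for t in range(T):
                z = eta * np.exp(log_w - log_w.max())  # eta-weighted exp. weights
                p = z / z.sum()
                ell = loss_fn(t)
                total += float(p @ ell)
                log_w -= eta * ell                     # per-expert update
            return total

        # expert 0 (range 0.01) always loses 0; expert 1 (range 10) always loses 10:
        # the learner's cumulative loss should stay close to expert 0's total of 0
        print(multiscale_hedge(lambda t: np.array([0.0, 10.0]), [0.01, 10.0], T=2000))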
